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Abstract 

While genomic erosion is common among intracellular symbionts, patterns of genome evolution in heritable extracellular 
endosymbionts remain elusive. We study vertically transmitted extracellular endosymbionts (Verminephrobacter, Betaproteo- 
bacteria) that form a beneficial, species-specific, and evolutionarily old (60-130 Myr) association with earthworms. We 
assembled a draft genome of Verminephrobacter aporrectodeae and compared it with the genomes of Verminephrobacter 
eiseniae and two nonsymbiotic close relatives (Acidovorax). Similar to V eiseniae, the V aporrectodeae genome was not 
markedly reduced in size and showed no A-T bias. We characterized the strength of purifying selection (© = 6N/6S) and codon 
usage bias in 876 orthologous genes. Symbiont genomes exhibited strong purifying selection (oo = 0.09 ± 0.07), although 
transition to symbiosis entailed relaxation of purifying selection as evidenced by 50% higher a> values and less codon usage bias 
in symbiont compared with reference genomes. Relaxation was not evenly distributed among functional gene categories but 
was overrepresented in genes involved in signal transduction and cell envelope biogenesis. The same gene categories also 
harbored instances of positive selection in the Verminephrobacter clade. In total, positive selection was detected in 89 genes, 
including also genes involved in DNA metabolism, tRNA modification, and TonB-dependent iron uptake, potentially highlighting 
functions important in symbiosis. Our results suggest that the transition to symbiosis was accompanied by molecular 
adaptation, while purifying selection was only moderately relaxed, despite the evolutionary age and stability of the host 
association. We hypothesize that biparental transmission of symbionts and rare genetic mixing during transmission can prevent 
genome erosion in heritable symbionts. 

Key words: symbiosis, evolution, nonsynonymous substitutions, purifying selection, positive selection, genome reduction, 
extracellular symbiont. 



Symbiotic associations between bacteria and animal hosts 
shape the ecology and evolution of both partners (Dubilier 
et al. 2008; Moran et al. 2008; Moya et al. 2008). Transition 
of free-living bacteria to obligate endosymbionts profoundly 
affects bacterial genome evolution (Moran et al. 2008; 



Moya et al. 2008; Bright and Bulgheresi 2010; Toft and 
Andersson 2010). Periodic population bottlenecks during 
vertical transmission magnify the effects of genetic drift. 
A life cycle spent in the stable and protected host environ- 
ment also reduces the scope for genetic mixing. These 
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Table 1 

Summary of the Lifestyle and Genomic Properties of the Four Compared Species 

Genomic Predicted Number 

Genome Size G + C Content of Protein-Coding 

Species Lifestyle (Mbp) (%) Genes GenBank Accession Number 

Aac Free living 5.3 68.5 4,709 CP000512 

Ajs 4.5 66.1 4,007 CP000539 

Vei Heritable symbiont 5.6 65.0 4,908 CP000542 

Vtu 4.7 65.0 3,788 AFAL00000000 

Data on the Aac, Ajs, and Vei genomes were obtained from GenBank. The Vtu genome size represents a minimum estimate based on the length of concatenated contigs (for 
details, see supplementary text, Supplementary Material online); its G + C content was calculated from a concatenation of the predicted genes. 



factors lessen the efficacy of natural selection and explain 
genomic erosion, that is, accelerated substitution rates, bi- 
ased nucleotide base composition, pseudogenization, and 
gene loss in symbiont genomes. Genome erosion is best 
documented for obligate intracellular endosymbionts of in- 
sects (Moran et al. 2008, 2009; Toft and Andersson 2010) 
and has become the paradigm for genome evolution of such 
heritable symbionts (Andersson and Kurland 1998). In con- 
trast, little (and conflicting) information is available for ex- 
tracellular bacterial endosymbionts (Dubilier et al. 2008); 
while those of stinkbugs show vertical transmission, cospe- 
ciation, and reductive genome evolution in line with intra- 
cellular insect endosymbionts (Hosokawa et al. 2006; 
Kikuchi et al. 2009), the genomes of gutless oligochaete 
symbionts are similar in size and G + C content to their 
free-living relatives; however, their mode of transmission 
has not been fully resolved (Woyke et al. 2006). 

The genome of the extracellular earthworm endosymbiont 
Verminephrobacter eiseniae (Pinel et al. 2008) shows no signs 
of genome reduction either (Pinel 2009). Verminephrobacter 
(Betaproteobacteria) are species specific, extracellular endo- 
symbionts of most lumbricid earthworm species (Lund et al. 
2010). They colonize the earthworm's excretion organs, ne- 
phridia (Schramm et al. 2003), and are transmitted vertically: 
symbionts are deposited along with eggs and sperm in the 
egg capsules (Davidson and Stahl 2006) and during embryo- 
genesis, selectively recruited by the earthworm (Davidson and 
Stahl 2008). Fitness experiments have shown that the sym- 
bionts increase the reproductive success of their host (Lund, 
Holmstrup, et al. 2010); the earthworm-Verm inephrobacter 
symbiosis is therefore considered mutually beneficial. Al- 
though host and symbiont can be separated under laboratory 
conditions (Lund, Holmstrup, etal. 2010), the host-associated 
lifestyle of Verminephrobacter is evolutionary stable and as 
old as that of many heritable intracellular endosymbionts 
of insects (Moran etal. 2008). Strict host fidelity and a pattern 
of cospeciation indicate an origin of the Verminephrobacter- 
earthworm symbiosis in the most recent common ancestor of 
lumbricid earthworms, 60-130 Myr (Lund et al. 2010). 

Here, we obtain a draft genome from a second Vermi- 
nephrobacter species (Lund et al. 2012). We contrast the 
strength of purifying selection and the level of positive 



selection in the two Verminephrobacter genomes with 
those found in two free-living bacteria from the sister genus 
Acidovorax to study how the ancient transition to a symbi- 
otic lifestyle affected genome evolution in these extracellu- 
lar heritable endosymbionts. 

Lack of Genome Erosion in 
Verminephrobacter Symbionts 

The draft genome assembly of Verminephrobacter aporrec- 
todeae subsp. tuberculatae strain At4 T (Vtu) consisted of 
1,082 unique contigs with an average read depth of 45x 
(see Materials and Methods and supplementary text, Sup- 
plementary Material online). The Vtu and the V. eiseniae 
strain EF01-2 7 (Vei) genomes are similar in size, gene, and 
G + C content to those of Acidovorax avena subsp. citrulli 
strain AAC00-1 (Aac) and Acidovorax sp. )S42 (Ajs) (table 1). 
By comparing the gene complement encoded by the Vtu 
genome against those of Vei, Aac, and Ajs, it is obvious that 
the size of the Vtu draft genome assembly (4.6 Mbp) under- 
estimates the actual genome size, which is likely close to the 
size of the Vei genome (supplementary text, Supplementary 
Material online). Thus the symbiont genomes have appar- 
ently escaped the size reduction and strong A-T bias typical 
of heritable symbiotic bacteria. Using reciprocal basic local 
alignment search tool (Blast), we identified a set of 1,038 
orthologous genes shared among the four genomes. Phylo- 
genetic analyses of various taxonomic marker genes consis- 
tently group all known Verminephrobacter species in 
a monophyletic clade with Acidovorax as a sister lineage 
(Pinel et al. 2008; Lund et al. 2010). We inferred by maxi- 
mum likelihood, a four-species unrooted phylogeny for each 
ortholog alignment. In 961 cases out of the 1 ,038, the spe- 
cies tree topology described above was strongly supported 
by an Akaike's Information Criterion (AIC) at least three units 
lower (better) than alternative topologies. The remaining 77 
orthologs were discarded from further analyses because 
they did not discriminate reliably between topologies (n 
= 65) or supported alternative topologies (n = 12). To fur- 
ther guard against spurious results due to saturation of nu- 
cleotide substitutions, we examined the total number of 
synonymous substitutions estimated on each ortholog 



308 Genome Biol. Evol. 4(3):307-31 5. doi:10.1093/gbe/evs014 Advance Access publication February 14, 2012 



Purifying Selection and Molecular Adaptation in the Genome of Verminephrobacter 



GBE 




0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 

dN/dS 



Fig. 1. — Intensity of purifying selection (d/V/dS) in symbiont (Vei, Vtu) and in nonsymbiont (Aac, Ajs) genomes. Distribution of 6N/6S in each 
species was inferred from the least diverged data set (n = 876 orthologs) and assuming branch model H1 (see supplementary fig. S1, Supplementary 
Material online). The empirical frequency distributions obtained for each species were smoothed using a Gaussian kernel method. Note that a few d/W 
d5 values larger than 1 were discarded before smoothing (3-7 data points excluded per genome). Genome average d/V/d5 estimates for each species 
are also indicated (colored bullet). Data from the literature are depicted in the lower part of the plot (vertical ticks) for comparison. We used reports of 
44 average d/V/d5 values based on at least 300 orthologs per set of genomes and classified genomes according to their lifestyle (see color legend). 



phylogeny (supplementary fig. S2, Supplementary Material 
online) and discarded every ortholog exhibiting >6 synon- 
ymous substitutions expected per site (summed over the en- 
tire tree). This left a set of n = 876 orthologs representing 
"least diverged" genes conforming to the species tree. 

With this set, we quantified divergence between the two 
symbiont genomes. Pairwise divergence between the Vei 
and Vtu genomes accounts for about one-third of the total 
divergence in each of the unrooted ortholog phylogenies 
with 1 and 0.1 expected synonymous and nonsynonymous 
substitution per site, respectively (mean d5 ± standard de- 
viation [SD] = 1.05 ± 0.33; mean d/V ± SD = 0.1 ± 0.05; 
supplementary fig. S2 and table S6, Supplementary Material 
online). We then quantified selective constraints in the sym- 
biont genomes by estimating the ratio of nonsynonymous to 
synonymous substitution rates cd = 6N/6S. Values of oo < 1 
indicate purifying selection, that is, most amino acid substi- 
tutions are deleterious and thus removed, ce> = 1 indicates 
the absence of constraints, whereas ce> >1 indicates that 
fixation of amino acid-replacing mutations is driven by pos- 
itive selection. We estimated a> on each terminal branch of 



the unrooted ortholog phylogenies (fig.1, branch model H1, 
supplementary fig. SI , Supplementary Material online). Me- 
dian co values were 0.04 and 0.06 in the Aac and Ajs ge- 
nomes, respectively, and 0.09 in both symbiont genomes. 
These values are within the range reported for free-living 
bacteria (0.02-0.09; fig. 1 ; Novichkov et al. 2009) and lower 
than in heritable endosymbionts and host-associated patho- 
gens (0.11-0.25; fig. 1; Kuwahara et al. 2008; Kuo et al. 
2009; Novichkov et al. 2009), indicating that Verminephro- 
bacter genomes still undergo fairly strong purifying selection. 

Relaxed Purifying Selection in 
Verminephrobacter Genomes 
Compared with Their Free-Living 
Relatives 

Despite substantial purifying selection in Verminephro- 
bacter, we found clear evidence for relaxed purifying selec- 
tion in the Vei and Vtu genomes relative to Acidovorax. The 
genome-wide distribution of a> was shifted toward higher 
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Fig. 2. — Genome-wide relaxation of purifying selection in symbiont genomes. {A) Comparison of the selective constraint (d/V/d5) in symbiont (Vei, 
Vtu) versus nonsymbiont (Aac, Ajs) branches of unrooted ortholog phylogenies. Each point represents the dN/dS value for each ortholog from the least 
diverged data set (n = 876). Values of d/V/d5 reported here are obtained using a model averaging procedure over the set of branch models (H0-H4; for 
details, see supplementary fig. S1 and text, Supplementary Material online). (B) Comparison of the intensity of codon usage bias (ENCp) in symbiont 
versus nonsymbiont genomes. The intensity of codon usage for each ortholog was computed as the average of the ENCp estimate for symbionts and 
nonsymbionts. 



values in the symbiont genomes (fig. 1), and 755 of the 876 
orthologs had a higher co (binomial test, P < 10~ 10 ) in the 
symbiont versus nonsymbiont terminal branches (fig. 2A). 
This finding is robust on three accounts. First, our analysis 
is based on a model-averaging procedure to ensure reliable 
estimation of co (see Materials and Methods). Second, the 
finding is not an artifact of a single symbiont genome 
(Vtu) being poorly sequenced and yielding apparent ele- 
vated nonsynonymous substitution rates (supplementary 
fig. S3, Supplementary Material online). Last, even when re- 
stricting our analysis to subsets of orthologs with d5 < 2.5 
(corresponding to —0.5 expected substitution per branch 
and site), the vast majority of orthologs exhibited inflated 
co values on the symbiont branches (supplementary fig. 
S4, Supplementary Material online). 

Codon usage bias analysis represents an independent way 
to document relaxed purifying selection (Hershberg and 
Petrov 2008). Less bias is expected when purifying selection 
fails to remove slightly deleterious mutations with suboptimal 
codon usage. We quantified codon usage bias in the four 
species through the effective number of codons (ENCp) used 
(Novembre 2002). ENCp is inversely proportional to the inten- 
sity of codon usage bias; high values therefore indicate 
relaxed purifying selection. In Vei and Vtu, 90% of the 876 or- 
thologs had higher ENCp values (binomial test, P < 10~ 10 ) 
(fig. 2B), providing additional evidence for relaxed purifying 
selection in the Verminephrobacter genomes. 

The degree of relaxation in purifying selection varies 
across different functional gene categories; this has been 
used to infer the role of genes in insect endosymbionts (Toft 
et al. 2009; Perez-Brocal et al. 201 1): greatest relaxation is 
expected for genes rendered obsolete upon transition to 
symbiosis, while strong purifying selection indicates essen- 
tial gene functions. Querying for orthologs with the highest 
absolute increase in co (Aco > 0.09) between symbiont and 
nonsymbiont lineages, we identified 102 orthologs that 



experienced the greatest relaxation of purifying selection 
in the two symbiont genomes (supplementary table S1 , Sup- 
plementary Material online). To ensure that large co values 
are not caused by erroneous recruitment of paralogs, we 
used BlastP searches to identify the number of per-genome 
paralogs for each of the 1 02 orthologs. More than 90 ortho- 
logs had no identifiable paralogs in their parent genomes 
(supplementary table S1, Supplementary Material online). 
In the case of paralogs present, we validated inferred orthol- 
ogies by phylogenetic reconstruction (supplementary fig. 
S6, Supplementary Material online). This analysis left 100 
orthologs that experienced the strongest relaxation of puri- 
fying selection in the symbiont lineages. We examined the 
predicted functional categories of these orthologs using 
Clusters of Orthologous Groups (COG)-based analysis and 
annotation (Tatusov et al. 2001) and found significant over- 
representation of COG categories T (signal transduction; bi- 
nomial test, P < 0.002) and M (cell envelope biogenesis/ 
outer membrane functions, P < 0.00005) compared with 
the set of 876 orthologs (fig. 3A). This suggests that a size- 
able proportion of the genes involved in signal transduction 
and surface structures became dispensable when the Vermi- 
nephrobacter lineage became symbiotic. Such genes govern 
interactions with the environment and likely experience dra- 
matic changes in selective pressure upon transition from 
a free-living to a symbiotic lifestyle; genes in the latter cat- 
egory are often disproportionately lost in heritable intracel- 
lular symbionts (Toft and Andersson 2010). In contrast, 
orthologs involved in translation (COG category J) and most 
notably in membrane transport (COG categories E, F, G, and 
P) were about 3-fold underrepresented (P < 0.0005), sug- 
gesting a key importance of uptake systems for the Vermi- 
nephrobacter-ea rthworm symbiosis. Few instances of 
strongly relaxed purifying selection occur in the otherwise 
abundant COG category E (amino acid transport and metab- 
olism, supplementary fig. S5, Supplementary Material 
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Fig. 3. — COG-based profiling of gene function. (A) Profiles of the least diverged set of orthologs shared by all four genomes (n = 876) and profiles 
of the subsets containing either instances of greatest relaxed purifying selection (n = 100) or most significant instances of positive selection in the 
symbiotic branches {Vei, Vtu; n = 89). (B) Profiles of genes unique to the nonsymbiont (Aac, Ajs; n = 441) or the symbiont genomes (n = 975) 
compared with the least diverged set of orthologs. 



online). This pattern is consistent with the hypothesized sym- 
biont function in nitrogen recycling for their host (Pandazis 
1931; Schramm et al. 2003). Manual annotation of the 
100 orthologs under most relaxed purifying selection in 
Vei and Vtu confirmed the COG-based analysis (supplemen- 
tary table S1, Supplementary Material online). Orthologs in- 
volved in biosynthesis of membrane lipoproteins and 
glycolipids, and in biosynthesis and degradation of peptido- 
glycan represented COG category M, while two-component 
systems and transcriptional regulators represented COG cat- 
egory T (supplementary table S 1 , Supplementary Material on- 
line). 

Footprints of Molecular Adaptation in 
the Symbiont Genomes 

Transition to symbiosis can be seen as colonization of a new 
ecological niche (the host) triggering episodes of adaptation 
detectable at the molecular level (Toft and Andersson 201 0). 
We used a conservative branch-site likelihood ratio test 



(Zhang et al. 2005) to identify orthologs with sites evolving 
under positive selection (a> > 1) in the symbiont terminal 
branches of the four-species phylogeny. To account for mul- 
tiple testing (n = 876 tests), we used a false discovery rate 
(FDR) approach and detected 90 orthologs evolving under 
positive selection in the symbiont genomes (P < 0.005, es- 
timated FDR < 0.03 among significant tests). Using the 
same threshold, we only detected 1 0 orthologs evolving un- 
der positive selection in the nonsymbiont genomes. Scruti- 
nizing the 90 ortholog sets for erroneous recruitment of 
paralogs (see above), we identified one set where paralogy 
may bias our inference (supplementary fig. S7, Supplemen- 
tary Material online), leaving a total of 89 orthologs of which 
68 have no paralogs in any of the four genomes (supple- 
mentary table S2, Supplementary Material online). Another 
concern is the quality of the draft genome (454 sequencing) 
data. However, for the 89 positively selected genes 1) the 
overall dN/dS values for Vtu and Vei are similar (supplemen- 
tary fig. S3, Supplementary Material online), 2) rates of ho- 
mopolymeric runs are not systematically higher in Vtu 
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relative to Vei or Acidovorax (supplementary table S8, Sup- 
plementary Material online), 3) gene length is virtually iden- 
tical in Vtu and Ve/ (supplementary table S8, Supplementary 
Material online), and 4) frame shifts are absent; we are thus 
confident that detection of positive selection is not biased by 
454 sequencing errors. 

The 89 orthologs were functionally broadly distributed, 
with COG categories T, M, L and H marginally overrepre- 
sented relative to the initial set of orthologs (fig. 3A). As dis- 
cussed above, genes involved in signal transduction (T) and 
outer membrane functions (M) govern interactions with the 
environment. While some of these genes may become ob- 
solete upon transition to a symbiotic lifestyle, others may 
become essential for symbiont-host interactions and un- 
dergo adaptive evolution. Fifteen of the 89 orthologs were 
also in the subset of 100 orthologs showing greatest relax- 
ation of purifying selection (supplementary tables S1 and S2, 
Supplementary Material online). Two nonexclusive pro- 
cesses can explain the overlap. First, strong positive selection 
on a gene will increase its average dN/dS in the symbionts 
clade possibly lumping it in the "most relaxed selection" cat- 
egory. Second, a gene could experience relaxed purifying 
selection after transition to symbiosis, which, in turn, may 
open new avenues for adaptation. 

The 89 orthologs were manually annotated (supplemen- 
tary table S2, Supplementary Material online). Orthologs 
with a predicted function in signal transduction (T) primarily 
comprised two-component systems, which have a key role 
in niche adaptation of bacteria (Aim et al. 2006). Orthologs 
with a predicted outer membrane function represented lipid 
A-synthase (set 923), lipid A-sugar transferases (sets 333, 
339), and lipoprotein acyl-transferases (sets 93, 95). Such 
genes often undergo positive selection in pathogenic bac- 
teria, where they are important, for example, for attach- 
ment and evasion of host defenses (Fitzpatrick et al. 
2005; Chen et al. 2006; Soyer et al. 2009). Apolipoprotein 
acyl-transferase Lnt (set 93) evolves under positive selection 
in uropathogenic Escherichia coli that, like Verminephro- 
bacter, colonize a urinary tract environment (Chen et al. 
2006). In Vibrio fischeri, the extracellular symbiont of the 
Euprymna squid light organ, lipid A-acylating enzymes 
are key players in colonization (Adin et al. 2008). One of 
the lipid A-acylating proteins of V fischeri (HtrB1, locus 
tag VF_A0687) is homologous (30% pairwise amino acid 
identity) to ortholog set 339. Like V fischeri, Verminephro- 
bacter colonize their hosts by recruitment from a mixed mi- 
crobial community, migration to a specific host tissue, and 
attachment (Davidson and Stahl 2008). Surface lipoproteins 
and glycolipids may thus be important for both symbiont- 
host recognition and for attachment. 

Several of the 89 orthologs inferred to undergo positive 
selection have a predicted function in peptidoglycan biosyn- 
thesis (sets 297, 304) and degradation (sets 1 55, 306, 326, 
688, and 1477) (supplementary table S2, Supplementary 



Material online). Peptidoglycan monomers mediate host 
interactions of pathogenic and symbiotic bacteria and are 
recognized by the innate immune system of animals 
(Cloud-Hansen et al. 2006, Moya et al. 2008). Accordingly, 
N-acetylmuramoyl-L-alanine amidase, a putative homolog 
of ortholog set 326, also undergoes positive selection in ur- 
opathogenic £ coli (Chen et al. 2006). Peptidoglycan constit- 
uents are also important for the V fischeri-Euprymna 
symbiosis (Adin etal. 2009, Altura etal. 2011). The V fischeri 
membrane transport protein AmpG (locus tag VF_0720) con- 
trols the extracellular concentration of peptidoglycan mono- 
mers released by hydrolytic enzymes (like orthologous set 306 
and 326), which in turn affects the morphogenesis of the Eu- 
prymna light organ. The Vei and Vtu orthologs of set 1386 
(supplementary table S1, Supplementary Material online) are 
AmpG homologs sharing 32% full-length amino acid identity 
with the V fischeri AmpG and may serve a similar function as 
AmpG in V fischeri. Further discussion of the functional sig- 
nificance of the orthologs evolving under positive selection 
in Vei and Vtu is included in the footnote of supplementary 
table S2, Supplementary Material online. 

Drivers for Genome Evolution in 
Extracellular Heritable Symbionts 

Unlike intracellular insect endosymbionts (Moran et al. 
2008) and stinkbug extracellular endosymbionts (Hosokawa 
et al. 2006; Kikuchi et al. 2009), the earthworm symbiont 
genomes are not eroded, despite their ancient association, 
host fidelity, and vertical transmission (Davidson and Stahl 
2006, 2008; Lund et al. 201 0). Lack of genomic synteny be- 
tween Vei and Aac/Ajs and a high number of mobile ele- 
ments in the Vei genome (Pinel 2009) may indicate that 
Verminephrobacter moves toward genome reduction ac- 
cording to the symbiont stage theory (Moran et al. 2009; 
Toft and Andersson 2010). However, age of the symbiosis 
(60-130 Myr), strong purifying selection, and low levels 
of pseudogenization (Pinel 2009) do not support this sce- 
nario. Furthermore, analysis of genes unique for the symbi- 
ont lineage (fig. 3B) suggests that Verminephrobacter have 
acquired novel genes after transition into symbiosis, most 
prominently coding for amino acid and carbohydrate trans- 
port, while genes coding for signal transduction are under- 
represented compared with the reference genomes. 

Our results show a more nuanced pattern of genome 
evolution in Verminephrobacter compared with heritable in- 
tracellular symbionts of insects, that is, the absence of ge- 
nome reduction and A-T bias, fairly strong purifying 
selection, pervasive molecular adaptation, and signs of gene 
acquisition after transition to symbiosis. We propose that 
this pattern of genome evolution is best explained by the 
life cycle of the symbionts (Bright and Bulgheresi 201 0). Ver- 
minephrobacter experience a short free-living stage during 
vertical transmission: the symbionts move to the worm 
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surface and are, together with other bacteria, enclosed in 
egg capsules, where they colonize the developing embryos 
(Davidson and Stahl 2006, 2008). It is possible that earth- 
worms exchange symbionts residing in their mucus during 
mating, leading to biparental inheritance. This mode of 
transmission may select against the loss of genes essential 
for surviving outside the host and may offer a window of op- 
portunity for recombination and lateral gene transfer, which 
will slow down or prevent Muller's ratchet (Felsenstein 1974; 
Moran 1 996). In contrast, vertical transmission of insect intra- 
cellular and stinkbug extracellular endosymbionts is uniparen- 
tal (Hosokawa et al. 2006; Moran et al. 2008) and stringent, 
effectively preventing genetic mixing. 

Material and Methods 

Organisms and Genome Sequencing 

The annotated genome sequences of Aac, Ajs, and Vei were 
downloaded from the NCBI website (table 1). A draft 
genome sequence of Vtu was obtained by pyrosequencing 
using the GS20 and GS20 FLX platforms (Roche 454 Life 
Sciences) as per the manufacturers recommendations. 
Genomic DNA was extracted from a Vtu culture grown in 
liquid R2A medium using the "DNeasy Blood and Tissue 
Kit" (Qiagen). A shotgun library was prepared from 50 jag 
DNA using the "Standard DNA Library Preparation Kit" 
(Roche). Nebulized, purified, and A and B adapter-linked 
single strand DNA fragments (average size 200-700 bp) 
were clonally amplified using the "Emulsion PCR Kit I" 
(Roche). Four independent GS20 runs and one GS20 FLX 
sequencing run were performed to produce 1,625,984 
reads with an average length of 96 bp (GS20) and 
149,562 reads with an average length of 242 bp (GS20 
FLX), together totaling 192,595,373 sequenced bases. 
De novo sequence assembly was performed using CLC Ge- 
nomics Workbench (version 4.0.2) generating 1 ,082 unique 
contigs (size range 200-81,277 bp, average size 4,326 bp) 
comprising 4,681,801 bp. The Vtu draft genome sequence 
was assigned accession number AFAL00000000 in the 
GenBank WGS sequence repository. 

Putative protein-coding genes were identified with GLIM- 
MER v3.02 (Delcher et al. 2007) using default settings and 
a lower size cutoff of 150 bp. Predicted genes were vali- 
dated by homology searches using inferred amino acid se- 
quences as Blast search queries against release 45 of the 
NCBI refseq protein database (Pruitt et al. 2007). Hits with 
an e-value cutoff <1 x 10~ 20 were considered significant, 
leaving 3,788 predicted protein-coding genes (table 1). 

Identification of Orthologous Gene Sets 

Sets of orthologous genes shared by the four species were 
identified via amino acid sequence-based reciprocal Blast 
searches using an e-value cutoff of 1 x 1 0~ 20 . Inferred amino 



acid sequences of each identified gene set were aligned 
using ClustalW v1 .81 (Thompson et al. 1 994) and those sets 
comprised of members with less than 20% difference in se- 
quence length and that shared >35% sequence identity 
across their entire length were considered orthologous. 
DNA sequence-based codon alignments of the orthologous 
gene sets were constructed from the corresponding aligned 
amino acid sequences using PAL2NAL v12.2 (Suyama et al. 
2006). For functional assignment, orthologs were manually 
compared against InterPro (Zdobnov and Apweiler 2001) 
and COG (Tatusov et al. 2001) databases. Automated func- 
tional assignment was performed by rps-Blast (Marchler-Bauer 
et al. 2002) (standalone Blast release 2.2.24) against the Pre- 
formatted COG v.1 .0 database. The top e-value COG match 
was recorded for each query. 

Phylogenetic and Evolutionary Analyses 

A phylogenetic tree was inferred for each set of orthologs. 
As all sets comprise four species, three unrooted tree topol- 
ogies are possible: T1: {Aac, Ajs, [Vei, Vtu]), T2: {Aac, Vei, 
[Ajs, Vtu]), and T3: {Aac, Vtu, [Ajs, Vei]). T1 represents 
the canonical species trees. The likelihood of each topology 
was obtained under the general time reversible DNA substi- 
tution model using PhyML (Guindon and Gascuel 2003). AIC 
was used to compare the fit of alternative topologies. For 
each of the least diverged orthologs {n = 876), a set of al- 
ternative "branch models" (supplementary fig. S1, Supple- 
mentary Material online) was used to estimate cd and model 
changes in ce> along the underlying topology (here fixed to 
T1). The codeml program in the PAML package (wrapped 
using python scripts) was used to estimate jointly a> and 
the transition to transversion ratio using the "F3x4" model 
(Yang 2007). A model-averaging procedure was used for ro- 
bust estimation of a> values in the symbiont ("V") versus non- 
symbiont ("A") terminal branches of the T1 phylogeny. 
Briefly, for each ortholog, we obtained the AIC of 5 branch 
models (supplementary fig. S1 , Supplementary Material on- 
line) as AIC = -2 In L + 2p, where In/, denotes the log- 
likelihood of the data under the model and p the number 
of parameters fitted. We used differences in AIC (AAlCj) 
among the set of alternative branch models {HO, H1, H2, 
H3, H4} to calculate a set of weights {z 0 , z 1f z 2/ z 3 , z 4 } as 
z, = Exp(AAIC/)/[E/ Exp(AAICj)], where summation is over 
the set of five models. We then estimated a> A (respectively 
ce>v) as the weighted average of ce> values under each model 
(using z's as weights). This procedure guards against sensi- 
tivity of estimation of ce> values to a single branch model. We 
used the effective number of codons (ENC) and its extension 
to account for heterogeneity in nucleotide composition, ENC 
prime (ENCp), to quantify bias in codon usage (Novembre 
2002). ENCp values were estimated for each gene and 
species separately using the software ENCprime (version 
February 2006; Novembre 2002). Whenever needed, we 
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controlled for the FDR (Storey and Tibshirani 2003). All 
statistical analyses and model-averaging calculations were 
carried out in R (http://www.R-project.org). 

Supplementary Material 

Supplementary text, figures S1-S5, and tables S1-S8 are 
available at Genome Biology and Evolution online (http:// 
www.gbe.oxfordjournals.org/). 
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